Expressive Models and Comprehensive Benchmark for 2D Human Pose Estimation
Authors
Abstract
In this work we consider the challenging task of articulated human pose estimation in monocular images. Most current methods in this area [4, 8, 16, 14] are based on the pictorial structures (PS) model and are composed of unary terms modelling body part appearance and pairwise terms between adjacent body parts and/or joints capturing their preferred spatial arrangement. We advance the state of the art in articulated human pose estimation in three ways. First, we argue that modeling dependencies between non-adjacent body parts is important for effective pose estimation (cf. Fig. 1). We propose a model [10] that incorporates higher-order information between body parts by defining a conditional model in which all parts are a priori connected, but which becomes a tractable PS model once the mid-level features are observed. This allows us to model dependencies between non-adjacent parts effectively while retaining the exact and efficient inference procedure of a tree-based model. Second, we explore various types of appearance representations with the aim of improving the body part hypotheses [11]. We argue that in order to obtain effective part detectors it is necessary to leverage both the pose-specific appearance of body parts and the joint appearance of part constellations. We show that the proposed appearance representations are complementary, and that combining the best-performing appearance model with a flexible image-conditioned spatial model achieves the best result. Third, we introduce a novel benchmark, "MPII Human Pose" [3], that makes a significant advance in terms of diversity and difficulty, a contribution that we feel is required for future developments in human body models. This comprehensive dataset was collected using an established taxonomy of over 800 human activities. The collected images cover a wider variety of human activities than previous datasets, including various recreational, occupational, and household activities, and people are captured from a wider range of viewpoints. In addition, we provide a rich set of labels including positions of body joints, full 3D torso and head orientation, occlusion labels for joints and body parts, and activity labels. With these annotations we perform a detailed analysis [3, 12] of the leading 2D human pose estimation and activity recognition methods to understand success and failure cases for established models.
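The abstract's point about exact and efficient inference in a tree-based model can be illustrated with a minimal sketch of max-sum dynamic programming over a tree of parts. This is not the authors' implementation: it assumes each part has a small discrete set of candidate states, and the function name `tree_map_inference`, the data layout, and all scores are hypothetical. The image-conditioned terms and higher-order dependencies described above are not modelled here; the sketch covers only the generic unary-plus-pairwise tree case.

```python
def tree_map_inference(unary, pairwise, parent):
    """Exact MAP inference on a tree of parts via max-sum dynamic programming.

    unary[i][s]       : score of part i taking candidate state s
    pairwise[i][sp][s]: score of parent state sp with state s of part i
                        (pairwise[0] is unused, part 0 is the root)
    parent[i]         : index of part i's parent, -1 for the root;
                        assumes parent[i] < i (parts are topologically ordered)
    All names and the layout are illustrative assumptions.
    """
    n = len(unary)
    score = [list(u) for u in unary]   # accumulated subtree scores per state
    best_child = [None] * n            # best child state for each parent state

    # Leaves-to-root pass: fold each part's best score into its parent.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        best_child[i] = []
        for sp in range(len(score[p])):
            cand = [pairwise[i][sp][s] + score[i][s] for s in range(len(score[i]))]
            s_best = max(range(len(cand)), key=cand.__getitem__)
            best_child[i].append(s_best)
            score[p][sp] += cand[s_best]

    # Root-to-leaves pass: read off the maximising state of every part.
    states = [0] * n
    states[0] = max(range(len(score[0])), key=score[0].__getitem__)
    for i in range(1, n):
        states[i] = best_child[i][states[parent[i]]]
    return states, score[0][states[0]]


# Toy example: torso (root), two upper limbs attached to it, one lower limb.
parent = [-1, 0, 0, 2]
unary = [[0.5, 1.0], [2.0, 0.0], [0.0, 0.3], [1.0, 1.5]]
pairwise = [
    None,
    [[0.0, -1.0], [-3.0, 0.5]],
    [[0.2, 0.0], [0.0, 0.4]],
    [[0.0, 1.0], [0.7, 0.0]],
]
states, best_score = tree_map_inference(unary, pairwise, parent)
print(states, best_score)
```

With K candidate states per part and n parts, this pass costs O(n K^2), whereas scoring every joint configuration would cost O(K^n); this is why a model that reduces to a tree once features are observed stays tractable.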
Related resources
Synthetic 3D Model-Based Object Class Detection and Pose Estimation. (Détection de Classes d'Objets et Estimation de leurs Poses à partir de Modèles 3D Synthétiques)
The present thesis describes 3D model-based approaches to object class detection and pose estimation on single 2D images. We introduce learning, detection and estimation steps adapted to the use of synthetically rendered training data with known 3D geometry. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. By using CAD...
Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images, that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly available 3D pose data. We propose novel CNN supervision techniques, using a regularization structure while training that extends the concept of multi-level skip connections, and leverage first and...
LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images
We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D poses of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our Localization-Classificatio...
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is ...
Generative 2D and 3D Human Pose Estimation with Vote Distributions
We address the problem of 2D and 3D human pose estimation using monocular camera information only. Generative approaches usually consist of two computationally demanding steps. First, different configurations of a complex 3D body model are projected into the image plane. Second, the projected synthetic person images and images of real persons are compared on a feature basis, like silhouettes or...
Publication date: 2014